7 research outputs found

    Graph Convolutional Networks for Predictive Healthcare using Clinical Notes

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2020. 김선. Clinical notes in Electronic Health Record (EHR) systems are recorded as free text, in styles and with abbreviations that vary by personal preference, which makes it difficult to extract clinically meaningful information from them. Many computational methods have been developed for tasks such as medical text normalization, medical entity extraction, and patient-level prediction. Existing methods for patient-level prediction focus on capturing contextual or sequential information from clinical text, but they are not designed to capture global, non-consecutive information. Recently, graph convolutional networks (GCNs) have been used successfully for text classification because a GCN can extract global and long-distance information across a whole text. However, the application of GCNs to mining clinical notes has yet to be fully explored. In this study, we propose an end-to-end framework that analyzes clinical notes with graph neural network-based techniques to predict whether a patient is positive or negative for MRSA (methicillin-resistant Staphylococcus aureus) infection. For this prediction, it is critical to capture patient-specific and global non-consecutive information from the patient's clinical notes. The clinical notes of each patient are processed to construct a patient-level graph, and each patient-level graph is fed into the GCN-based framework for graph-level supervised learning. The proposed framework consists of a graph convolutional network layer, a graph pooling layer, and a readout layer, followed by a fully connected layer. We tested variants of the framework that combine different graph convolution operations and graph pooling methods, and evaluated the performance of each variant.
In experiments with MRSA infection data, all variants that use graph structure information outperformed several baseline methods that do not, by margins of 2.93% to 11.81%. We also inspected the graphs at the pooling step to support interpretable analysis from both a population-level statistical perspective and a patient-specific perspective. Through this inspection, we found long-distance word pairs that are distinctive of MRSA-positive patients, and we showed the pooled graph of a patient that contributes to the patient-specific prediction. Moreover, the AdaBoost algorithm was used to improve performance further. As a result, the proposed framework reached its highest performance of 85.70%, exceeding the baseline methods by margins of 3.71% to 12.59%.
Abstract in Korean (translated): Electronic health records are patient health information collected systematically in digital form. Because an electronic health record is a set of documents made up of words that describe a patient's condition, a variety of machine learning methods from natural language processing have been applied to it. In particular, with advances in deep learning, techniques once used for images and text are gradually being applied to bioinformatics and medical informatics. Unlike conventional image or text data, however, electronic health record data is highly patient-specific, depending on the author and on each patient's condition. Correlations between health records with similar meanings also need to be considered. In this study, we designed a graph-based deep learning model that accounts for the patient specificity of electronic health record data. We generated patient-specific graphs from the co-occurrence frequencies of a patient's electronic health record data and medical documents. On this basis, we designed a model that uses a graph convolutional network to predict a patient's pathological state. The data used in the study records whether patients were infected with Methicillin-Resistant Staphylococcus Aureus (MRSA). When predicting patient resistance, the proposed graph-based deep learning model outperformed existing models that do not use graph information by 2.93% to 11.81%. We also examined the graphs at the pooling step to carry out interpretable analysis. This revealed long-distance word patterns that are distinctive of MRSA-positive patients and showed the pooled graph of a patient that contributes to the patient-specific prediction. The AdaBoost algorithm was used to improve performance further. The result proposed in this thesis recorded the highest performance, 85.70%, an improvement of 3.71% to 12.59% over existing models.
Table of contents:
    Chapter 1 Introduction
      1.1 Background
        1.1.1 EHR Clinical Text Data
        1.1.2 Current methods and limitations
      1.2 Problem Statement and Contributions
    Chapter 2 Related Works
      2.1 Traditional Methods
      2.2 Deep Learning Methods
      2.3 Graph Neural Networks
        2.3.1 Graph Convolutional Networks
        2.3.2 Graph Pooling Methods
        2.3.3 Applications of GNN
    Chapter 3 Methods and Materials
      3.1 Notation and Problem Definition
      3.2 Patient Graph Construction Process
        3.2.1 Parsing and Filtering
        3.2.2 Word Co-occurrence Finding
        3.2.3 Patient-level Graph Representation
      3.3 Word Embedding
      3.4 Model Architecture
        3.4.1 Graph Convolutional Network layer
        3.4.2 Graph Pooling layer
        3.4.3 Readout Layer
      3.5 Prediction and Loss Function
      3.6 Adaboost algorithm
    Chapter 4 Experiments
      4.1 EHR Dataset
        4.1.1 Introduction to MIMIC-III Dataset
        4.1.2 MRSA Data Collection
      4.2 Hyper Parameter Settings
        4.2.1 Model Training
      4.3 Baseline Models
    Chapter 5 Results
      5.1 Performance Comparisons with baseline models
      5.2 Performance Comparisons with graph networks
      5.3 Interpretable analysis
      5.4 Adaboost Result
    Chapter 6 Conclusion
    Abstract in Korean
    Acknowledgements
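The pipeline this abstract describes (word co-occurrence graph per patient, graph convolution, readout) can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the sliding-window rule, the one-hot features, the fixed weight matrix, and the helper names `cooccurrence_graph`, `gcn_layer`, and `mean_readout` are all assumptions made for illustration, the propagation rule is the commonly used symmetric-normalized GCN step, and the pooling layer is omitted.

```python
import math

def cooccurrence_graph(tokens, window=3):
    """Link every pair of distinct words that fall inside the same sliding window."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    adj = [[0.0] * len(vocab) for _ in vocab]
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:  # words that share a window with w
            if v != w:
                adj[index[w]][index[v]] = adj[index[v]][index[w]] = 1.0
    return vocab, adj

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W), the common GCN propagation rule."""
    n = len(adj)
    a_hat = [[adj[i][j] + (i == j) for j in range(n)] for i in range(n)]  # add self-loops
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, x) for x in row] for row in matmul(matmul(norm, feats), weight)]

def mean_readout(h):
    """Average node vectors into one graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]

# Toy, made-up tokens standing in for one patient's filtered notes (assumption).
tokens = ["wound", "culture", "positive", "wound", "swab"]
vocab, adj = cooccurrence_graph(tokens)
feats = [[float(i == j) for j in range(len(vocab))] for i in range(len(vocab))]  # one-hot
weight = [[1.0, 0.5] for _ in vocab]  # fixed stand-in for learned parameters
graph_vector = mean_readout(gcn_layer(adj, feats, weight))
```

In a trained model, `graph_vector` would feed a fully connected classifier, and `weight` would be learned rather than fixed.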

    SPGP: Structure Prototype Guided Graph Pooling

    While graph neural networks (GNNs) have been successful at node classification and link prediction on graphs, learning graph-level representations remains a challenge. For a graph-level representation, it is important to learn both the representation of neighboring nodes (aggregation) and the structural information of the graph. A number of graph pooling methods have been developed for this goal. However, most existing pooling methods use the k-hop neighborhood without considering explicit structural information in the graph. In this paper, we propose Structure Prototype Guided Pooling (SPGP), which uses prior graph structures to overcome this limitation. SPGP formulates graph structures as learnable prototype vectors and computes the affinity between nodes and prototype vectors. This leads to a novel node scoring scheme that prioritizes informative nodes while encapsulating the useful structures of the graph. Our experimental results show that SPGP outperforms state-of-the-art graph pooling methods on graph classification benchmark datasets in both accuracy and scalability. Comment: 18 pages, 6 figures
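The idea of scoring nodes by their affinity to prototype vectors and then keeping the highest-scoring nodes can be sketched as follows. This is only an illustration under stated assumptions, not SPGP itself: cosine similarity as the affinity, the maximum over prototypes as the node score, fixed rather than learned prototypes, and the function names `score_nodes` and `topk_pool` are all hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity, with 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score_nodes(node_feats, prototypes):
    """Score each node by its best affinity to any prototype (assumed rule)."""
    return [max(cosine(x, p) for p in prototypes) for x in node_feats]

def topk_pool(node_feats, adj, scores, ratio=0.5):
    """Keep the top ceil(ratio * n) nodes and the subgraph they induce."""
    k = max(1, math.ceil(ratio * len(node_feats)))
    keep = sorted(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])
    feats = [node_feats[i] for i in keep]
    sub = [[adj[i][j] for j in keep] for i in keep]
    return keep, feats, sub

# Toy 4-node path graph 0-1-2-3 with 2-d features and a single prototype.
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = score_nodes(node_feats, prototypes=[[1.0, 0.0]])
keep, pooled_feats, pooled_adj = topk_pool(node_feats, adj, scores)
```

In this toy graph the two retained nodes are not adjacent, so plain top-k selection can disconnect the pooled graph.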

    Clinical Note Owns its Hierarchy: Multi-Level Hypergraph Neural Networks for Patient-Level Representation Learning

    Leveraging knowledge from electronic health records (EHRs) to predict a patient's condition is essential to the effective delivery of appropriate care. Clinical notes in patient EHRs contain valuable information from healthcare professionals, but have been underused because of their difficult contents and complex hierarchies. Recently, hypergraph-based methods have been proposed for document classification. Directly adopting existing hypergraph methods on clinical notes cannot sufficiently utilize the hierarchy information of the patient, which can degrade clinical semantic information through (1) frequent neutral words and (2) hierarchies with imbalanced distribution. Thus, we propose a taxonomy-aware multi-level hypergraph neural network (TM-HGNN), where multi-level hypergraphs assemble useful neutral words with rare keywords via note-level and taxonomy-level hyperedges to retain the clinical semantic information. The constructed patient hypergraphs are fed into hierarchical message passing layers to learn more balanced multi-level knowledge at the note and taxonomy levels. We validate the effectiveness of TM-HGNN through extensive experiments on the MIMIC-III dataset, using the benchmark in-hospital mortality prediction task. Comment: ACL 2023 Main Conference
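A rough sketch of what note-level and taxonomy-level hyperedges over word nodes could look like is given below. It is not the TM-HGNN construction: treating each distinct word as one shared node, the `build_hypergraph` name, and the toy note IDs, category names, and tokens are assumptions made for illustration.

```python
from collections import defaultdict

def build_hypergraph(notes):
    """Group word nodes into one hyperedge per note and one per taxonomy.

    notes: iterable of (note_id, taxonomy, tokens) triples.
    Returns two dicts that map a hyperedge name to its set of word nodes.
    """
    note_edges = {}
    taxonomy_edges = defaultdict(set)
    for note_id, taxonomy, tokens in notes:
        words = set(tokens)
        note_edges[note_id] = words          # note-level hyperedge
        taxonomy_edges[taxonomy] |= words    # taxonomy-level hyperedge
    return note_edges, dict(taxonomy_edges)

# Toy, made-up notes for one patient (assumption).
notes = [
    ("n1", "Nursing", ["pt", "stable"]),
    ("n2", "Nursing", ["pt", "afebrile"]),
    ("n3", "Radiology", ["chest", "clear"]),
]
note_edges, taxonomy_edges = build_hypergraph(notes)
```

A hyperedge here is just a set of nodes, which is what lets a single edge connect all the words of a note or of a whole note category at once.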

    Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

    Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences, and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structure can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results, and reveal the necessity to learn sparse structures for each document.
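The initial graph described here, a disjoint union of sentence-level co-occurrence graphs plus a pool of candidate cross-sentence edges, can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: the window rule, the choice to list every cross-sentence node pair as a candidate, and the `document_graph` name are hypothetical, and the learned sparse edge selection is not shown.

```python
from itertools import combinations

def document_graph(sentences, window=2):
    """Build per-sentence co-occurrence edges and cross-sentence candidate edges.

    Nodes are (sentence index, word) pairs, so the same word in two
    sentences yields two separate nodes (the disjoint union).
    """
    nodes, intra = [], set()
    for s, sent in enumerate(sentences):
        for w in sent:
            if (s, w) not in nodes:
                nodes.append((s, w))
        for i, w in enumerate(sent):
            for v in sent[i + 1 : i + 1 + window]:  # next `window` words
                if v != w:
                    intra.add(frozenset([(s, w), (s, v)]))
    # Every pair of nodes from different sentences is a candidate edge
    # that a structure-learning step could keep or drop.
    candidates = [(a, b) for a, b in combinations(nodes, 2) if a[0] != b[0]]
    return nodes, intra, candidates

# Toy document: "bank" appears in both sentences with different senses.
sentences = [["bank", "raised", "rates"], ["river", "bank"]]
nodes, intra, candidates = document_graph(sentences, window=1)
```

Keeping the two occurrences of "bank" as separate nodes is what leaves room for a later step to decide whether they should be connected.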

    DRPreter: Interpretable Anticancer Drug Response Prediction Using Knowledge-Guided Graph Neural Networks and Transformer

    Some of the recent studies on drug sensitivity prediction have applied graph neural networks to leverage prior knowledge on the drug structure or gene network, and other studies have focused on the interpretability of the model to delineate the mechanism governing the drug response. However, it is crucial to make a prediction model that is both knowledge-guided and interpretable, so that the prediction accuracy is improved and practical use of the model can be enhanced. We propose an interpretable model called DRPreter (drug response predictor and interpreter) that predicts the anticancer drug response. DRPreter learns cell line and drug information with graph neural networks; the cell-line graph is further divided into multiple subgraphs with domain knowledge on biological pathways. A type-aware transformer in DRPreter helps detect relationships between pathways and a drug, highlighting important pathways that are involved in the drug response. Extensive experiments on the GDSC (Genomics of Drug Sensitivity and Cancer) dataset demonstrate that the proposed method outperforms state-of-the-art graph-based models for drug response prediction. In addition, DRPreter detected putative key genes and pathways for specific drug-cell-line pairs with supporting evidence in the literature, implying that our model can help interpret the mechanism of action of the drug.

    Exploring chemical space for lead identification by propagating on chemical similarity network

    Motivation: Lead identification is a fundamental step in prioritizing candidate compounds for the downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical property and experimental information. However, ML and DL methods rarely consider compound similarity information directly, since their models are built on abstract representations of molecules. Alternatively, data mining approaches are used to explore chemical space for drug candidates by screening out undesirable compounds. A major challenge for data mining approaches is to develop efficient methods that search a large chemical space for desirable lead compounds with a low false positive rate. Results: In this work, we developed a network propagation (NP) based data mining method for lead identification that performs search on an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug target interaction model to narrow down compound candidates, and then we use network propagation to prioritize drug candidates that are highly correlated with a drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered intentionally unlabeled compounds for given targets. To further demonstrate the prediction power of our approach, we identified 24 candidate leads for CLK1. Two out of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
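Network propagation on a similarity network is often implemented as a random walk with restart, and the sketch below uses that form as an assumption, since the abstract does not say which variant the authors use. The `propagate` name, the restart probability, and the toy four-compound network are hypothetical; the paper's ensemble of 14 networks and its drug target interaction model are not represented.

```python
def propagate(adj, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Random walk with restart: p <- (1 - restart) * W p + restart * p0.

    adj:   symmetric similarity matrix (list of rows).
    seeds: initial score per compound, e.g. 1.0 for known actives.
    W is adj with each column normalized to sum to 1.
    """
    n = len(adj)
    total = sum(seeds)
    if total == 0:
        raise ValueError("at least one seed must have a positive score")
    col_sum = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sum[j] if col_sum[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [s / total for s in seeds]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:  # L1 change is small enough
            return nxt
        p = nxt
    return p

# Toy network: four compounds in a chain 0-1-2-3, with compound 0 as the seed.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = propagate(adj, seeds=[1.0, 0.0, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Compounds closer to the seed in the similarity network receive higher scores, which is the property used to prioritize unlabeled candidates.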

    Glycyrrhizic Acid Mitigates Tripterygium-Glycoside-Tablet-Induced Acute Liver Injury via PKM2 Regulated Oxidative Stress

    Tripterygium glycoside tablet (TGT), as a common clinical drug, can easily cause liver damage due to its narrow therapeutic window. Glycyrrhizic acid (GA) has a hepatoprotective effect, but the characteristics and mechanism of GA's impact on TGT-induced acute liver injury by regulating oxidative stress remain unelucidated. In this study, TGT-induced acute liver injury models were established in vitro and in vivo. The levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (AKP), superoxide dismutase (SOD), malondialdehyde (MDA), glutathione (GSH), catalase (CAT), lactate dehydrogenase (LDH), tumor necrosis factor-α (TNF-α), interleukin-1β (IL-1β) and interleukin-6 (IL-6) were quantified. The anti-apoptotic effect of GA was tested using flow cytometry. Potential target proteins of GA were profiled via activity-based protein profiling (ABPP) using a cysteine-specific (IAA-yne) probe. The results demonstrate that GA markedly decreased the concentrations of ALT, AST, AKP, MDA, LDH, TNF-α, IL-1β and IL-6, whereas those of SOD, GSH and CAT increased. GA could inhibit TGT-induced apoptosis in BRL-3A cells. GA bound directly to the cysteine residue of PKM2. The CETSA and enzyme activity results validate the specific targets identified. GA could mitigate TGT-induced acute liver injury by mediating PKM2, reducing oxidative stress and inflammation and reducing hepatocyte apoptosis.